library(tidyverse)
library(ggplot2)
library(gtsummary)
library(readxl)
library(broom)
library(DT)
library(summarytools)
library(patchwork)
library(GGally)
library(gganimate)
library(gifski)
Data Visualization
2 INTRODUCTION
2.1 PURPOSE
The purpose of this assignment is to provide a data visualization analysis of zoonotic malaria infection cases in Pahang state from 2011 to 2022. The visualizations give insight into the epidemiological profile of individuals infected with zoonotic malaria and can help improve malaria control and prevention in Pahang specifically.
3 Overview of the Dataset
The dataset comprises records of 888 patients from 11 districts in Pahang, diagnosed between 2011 and 2022. The data are hierarchical, with two levels: patient-level factors (level 1) nested within districts (level 2). The variables of interest are:
- District (Daerah): the district where the infection occurred (11 districts).
- Age (Umur): the patient's age in years at diagnosis of zoonotic malaria.
- Gender (Jantina): the gender of the patient diagnosed with zoonotic malaria.
- Citizenship (Warganegara): whether the patient is a Malaysian citizen (holds a legal citizenship document) or a foreigner who lives and works in Malaysia without a citizenship ID.
- Forestry-related work (Pekerjaan): whether the patient's job scope is related to forestry.
- Parasite density (KepadatanParasit): the number of Plasmodium parasites observed under the microscope.
- Year (Year): the year in which the patient was diagnosed with zoonotic malaria (2011 to 2022).
- Duration (duration_days): the number of days from onset of symptoms to diagnosis; a duration of more than 4 days may reflect a delayed diagnosis.
4 INSTALLING PACKAGES AND LOADING LIBRARIES
4.1 READ DATASET
data1 <- read_excel("knowlesi.xlsx")
View(data1)
4.2 Data wrangling
data1 <- data1 %>% mutate_if(is.character, ~ as_factor(.))
data1$KepadatanParasit <- as.numeric(as.character(data1$KepadatanParasit))
glimpse(data1)
Rows: 888
Columns: 18
$ Daerah <fct> BENTONG, LIPIS, JERANTUT, LIPIS, RAUB, LIPIS, MARA…
$ Umur <dbl> 22, 71, 28, 36, 32, 24, 22, 16, 30, 50, 52, 25, 34…
$ Jantina <fct> MALE, MALE, MALE, MALE, MALE, MALE, MALE, MALE, FE…
$ Hamil <fct> NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO…
$ Bangsa <fct> INDIA, CINA, MELAYU, MELAYU, MELAYU, MELAYU, MELAY…
$ Warganegara <fct> YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA…
$ Pekerjaan <fct> FOREST RELATED, NON FOREST RELATED, NON FOREST REL…
$ Kawasan <fct> RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, R…
$ KepadatanParasit <dbl> 1200, 64000, 1500, 4640, 3040, 1120, 1800, 20000, …
$ dateNotifikasi <dttm> 2011-01-12, 2011-01-14, 2011-02-09, 2011-03-04, 2…
$ dateOnset <dttm> 2011-01-07, 2011-01-05, 2011-01-30, 2011-02-25, 2…
$ CaraPengesananKes <fct> PCD, PCD, PCD, PCD, PCD, PCD, PCD, PCD, PCD, PCD, …
$ KlasifikasiKes <fct> INDIGENOUS, INDIGENOUS, INDIGENOUS, INDIGENOUS, IN…
$ G6PD <fct> NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO…
$ PreliminaryDiagnose <fct> UNCOMPLICATED, UNCOMPLICATED, UNCOMPLICATED, UNCOM…
$ PernahHidap <fct> TIDAK, TIDAK, TIDAK, TIDAK, TIDAK, TIDAK, TIDAK, T…
$ Yearx <dttm> 2011-01-12, 2011-01-14, 2011-02-09, 2011-03-04, 2…
$ Year <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20…
4.3 Construct new meaningful variables (time to diagnosis)
data1 <- data1 %>%
  mutate(dur = as.duration(dateOnset %--% dateNotifikasi))
data1 <- data1 %>%
  mutate(duration_days = as.integer(abs(as.numeric(dur)) / 86400))
summary(data1)
 Daerah Umur Jantina Hamil Bangsa
LIPIS :455 Min. : 0.00 MALE :742 NO :884 MELAYU :512
JERANTUT:121 1st Qu.:25.00 FEMALE:146 YES: 4 ORANG ASLI:127
MARAN : 75 Median :34.00 INDONESIA :105
RAUB : 71 Mean :37.02 CINA : 46
ROMPIN : 42 3rd Qu.:48.00 BANGLADESH: 37
TEMERLUH: 41 Max. :85.00 INDIA : 16
(Other) : 83 (Other) : 45
Warganegara Pekerjaan Kawasan KepadatanParasit
YA :716 FOREST RELATED :459 RURAL:797 Min. :1.600e+01
TIDAK:172 NON FOREST RELATED:429 URBAN: 91 1st Qu.:1.445e+03
Median :5.000e+03
Mean :1.929e+07
3rd Qu.:2.070e+04
Max. :4.598e+09
NA's :1
dateNotifikasi dateOnset
Min. :2011-01-12 00:00:00.00 Min. :2011-01-05 00:00:00.00
1st Qu.:2013-10-06 12:00:00.00 1st Qu.:2013-10-02 18:00:00.00
Median :2017-08-02 12:00:00.00 Median :2017-07-30 12:00:00.00
Mean :2017-02-12 11:16:12.97 Mean :2017-02-06 04:25:56.75
3rd Qu.:2020-03-05 06:00:00.00 3rd Qu.:2020-02-22 06:00:00.00
Max. :2022-12-30 00:00:00.00 Max. :2022-12-25 00:00:00.00
CaraPengesananKes KlasifikasiKes G6PD PreliminaryDiagnose
PCD:845 INDIGENOUS:888 NO :881 UNCOMPLICATED:707
ACD: 41 YES: 7 SEVERE :181
MBS: 2
PernahHidap Yearx Year
TIDAK:866 Min. :2011-01-12 00:00:00.00 Min. :2011
YA : 22 1st Qu.:2013-10-06 12:00:00.00 1st Qu.:2013
Median :2017-08-02 12:00:00.00 Median :2017
Mean :2017-02-12 11:16:12.97 Mean :2017
3rd Qu.:2020-03-05 06:00:00.00 3rd Qu.:2020
Max. :2022-12-30 00:00:00.00 Max. :2022
dur duration_days
Min. :-2246400s (~-3.71 weeks) Min. : 0.000
1st Qu.:345600s (~4 days) 1st Qu.: 4.000
Median :518400s (~6 days) Median : 6.000
Mean :543016.216216216s (~6.28 days) Mean : 6.981
3rd Qu.:691200s (~1.14 weeks) 3rd Qu.: 8.000
Max. :5788800s (~9.57 weeks) Max. :67.000
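The summary above shows a minimum `dur` of about -3.7 weeks, so at least one record has an onset date later than its notification date, and `abs()` converts that into a positive duration without any warning. A minimal sketch, assuming toy dates rather than the real file, of how such records might be flagged before the absolute value is taken. The data frame `toy` and the columns `signed_days` and `flag_negative` are hypothetical names introduced only for illustration.

```r
# Toy records (hypothetical values, not taken from knowlesi.xlsx)
toy <- data.frame(
  dateOnset      = as.Date(c("2011-01-07", "2011-01-05", "2011-03-01")),
  dateNotifikasi = as.Date(c("2011-01-12", "2011-01-14", "2011-02-03"))
)

# Signed number of days from onset to notification
toy$signed_days <- as.numeric(
  difftime(toy$dateNotifikasi, toy$dateOnset, units = "days")
)

# Flag records where notification precedes onset (likely data-entry errors)
toy$flag_negative <- toy$signed_days < 0

# Keep the duration only where it is plausible; leave the rest as NA for review
toy$duration_days <- ifelse(toy$flag_negative, NA_integer_,
                            as.integer(toy$signed_days))

print(toy)
```

Whether flagged records should be corrected, excluded, or kept as absolute values is a judgment for the analyst; the sketch only makes them visible.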
4.4 SELECT VARIABLES OF INTEREST
library(dplyr)
data2 <- data1 %>%
  dplyr::select(Daerah, Umur, Jantina, Hamil, Bangsa, Warganegara, Pekerjaan, Kawasan, KepadatanParasit, KlasifikasiKes, Year, duration_days)
summary(data2)
 Daerah Umur Jantina Hamil Bangsa
LIPIS :455 Min. : 0.00 MALE :742 NO :884 MELAYU :512
JERANTUT:121 1st Qu.:25.00 FEMALE:146 YES: 4 ORANG ASLI:127
MARAN : 75 Median :34.00 INDONESIA :105
RAUB : 71 Mean :37.02 CINA : 46
ROMPIN : 42 3rd Qu.:48.00 BANGLADESH: 37
TEMERLUH: 41 Max. :85.00 INDIA : 16
(Other) : 83 (Other) : 45
Warganegara Pekerjaan Kawasan KepadatanParasit
YA :716 FOREST RELATED :459 RURAL:797 Min. :1.600e+01
TIDAK:172 NON FOREST RELATED:429 URBAN: 91 1st Qu.:1.445e+03
Median :5.000e+03
Mean :1.929e+07
3rd Qu.:2.070e+04
Max. :4.598e+09
NA's :1
KlasifikasiKes Year duration_days
INDIGENOUS:888 Min. :2011 Min. : 0.000
1st Qu.:2013 1st Qu.: 4.000
Median :2017 Median : 6.000
Mean :2017 Mean : 6.981
3rd Qu.:2020 3rd Qu.: 8.000
Max. :2022 Max. :67.000
glimpse(data2)
Rows: 888
Columns: 12
$ Daerah <fct> BENTONG, LIPIS, JERANTUT, LIPIS, RAUB, LIPIS, MARAN, …
$ Umur <dbl> 22, 71, 28, 36, 32, 24, 22, 16, 30, 50, 52, 25, 34, 5…
$ Jantina <fct> MALE, MALE, MALE, MALE, MALE, MALE, MALE, MALE, FEMAL…
$ Hamil <fct> NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, NO, N…
$ Bangsa <fct> INDIA, CINA, MELAYU, MELAYU, MELAYU, MELAYU, MELAYU, …
$ Warganegara <fct> YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, YA, Y…
$ Pekerjaan <fct> FOREST RELATED, NON FOREST RELATED, NON FOREST RELATE…
$ Kawasan <fct> RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, RURAL, RURA…
$ KepadatanParasit <dbl> 1200, 64000, 1500, 4640, 3040, 1120, 1800, 20000, 568…
$ KlasifikasiKes <fct> INDIGENOUS, INDIGENOUS, INDIGENOUS, INDIGENOUS, INDIG…
$ Year <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
$ duration_days <int> 5, 9, 10, 7, 7, 8, 9, 8, 3, 5, 3, 5, 3, 9, 12, 7, 7, …
view(data2)
5 DESCRIPTIVE TABLE
# Create the descriptive table, stratified by district
table_summary <- data2 %>%
  tbl_summary(
    by = Daerah,
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  add_overall() %>%
  modify_header(label ~ "**Variable**") %>%
  modify_spanning_header(
    all_stat_cols() ~ "**Summary Statistics**"
  ) %>%
  modify_caption("**Sociodemographic characteristics of individuals infected with zoonotic malaria, by district**")
# Print the table
table_summary

Summary Statistics

| Variable | Overall N = 888¹ | BENTONG N = 29¹ | LIPIS N = 455¹ | JERANTUT N = 121¹ | RAUB N = 71¹ | MARAN N = 75¹ | ROMPIN N = 42¹ | TEMERLUH N = 41¹ | KUANTAN N = 32¹ | BERA N = 17¹ | C.HIGHLANDS N = 2¹ | PEKAN N = 3¹ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Umur | 37 (16) | 40 (14) | 37 (17) | 37 (14) | 37 (15) | 33 (16) | 38 (17) | 39 (17) | 37 (14) | 33 (10) | 27 (3) | 30 (9) |
| Jantina | ||||||||||||
| MALE | 742 (84%) | 25 (86%) | 366 (80%) | 107 (88%) | 62 (87%) | 67 (89%) | 38 (90%) | 30 (73%) | 28 (88%) | 14 (82%) | 2 (100%) | 3 (100%) |
| FEMALE | 146 (16%) | 4 (14%) | 89 (20%) | 14 (12%) | 9 (13%) | 8 (11%) | 4 (9.5%) | 11 (27%) | 4 (13%) | 3 (18%) | 0 (0%) | 0 (0%) |
| Hamil | 4 (0.5%) | 0 (0%) | 3 (0.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2.4%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Bangsa | ||||||||||||
| INDIA | 16 (1.8%) | 2 (6.9%) | 4 (0.9%) | 2 (1.7%) | 2 (2.8%) | 0 (0%) | 2 (4.8%) | 3 (7.3%) | 0 (0%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
| CINA | 46 (5.2%) | 1 (3.4%) | 15 (3.3%) | 12 (9.9%) | 5 (7.0%) | 0 (0%) | 8 (19%) | 0 (0%) | 4 (13%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
| MELAYU | 512 (58%) | 14 (48%) | 270 (59%) | 71 (59%) | 40 (56%) | 50 (67%) | 9 (21%) | 30 (73%) | 18 (56%) | 9 (53%) | 0 (0%) | 1 (33%) |
| ORANG ASLI | 127 (14%) | 4 (14%) | 59 (13%) | 9 (7.4%) | 11 (15%) | 13 (17%) | 19 (45%) | 5 (12%) | 1 (3.1%) | 3 (18%) | 2 (100%) | 1 (33%) |
| INDONESIA | 105 (12%) | 6 (21%) | 53 (12%) | 18 (15%) | 8 (11%) | 7 (9.3%) | 4 (9.5%) | 3 (7.3%) | 5 (16%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
| NEPAL | 9 (1.0%) | 1 (3.4%) | 4 (0.9%) | 2 (1.7%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (5.9%) | 0 (0%) | 0 (0%) |
| CAMBODIA | 9 (1.0%) | 0 (0%) | 5 (1.1%) | 1 (0.8%) | 1 (1.4%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
| MYANMAR | 12 (1.4%) | 0 (0%) | 8 (1.8%) | 2 (1.7%) | 2 (2.8%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| BANGLADESH | 37 (4.2%) | 0 (0%) | 28 (6.2%) | 2 (1.7%) | 2 (2.8%) | 2 (2.7%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 1 (5.9%) | 0 (0%) | 1 (33%) |
| PAKISTAN | 4 (0.5%) | 1 (3.4%) | 3 (0.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| THAILAND | 1 (0.1%) | 0 (0%) | 1 (0.2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| BUMIPUTRA SABAH | 4 (0.5%) | 0 (0%) | 2 (0.4%) | 2 (1.7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| BUMIPUTRA SARAWAK | 1 (0.1%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
| LAOS | 1 (0.1%) | 0 (0%) | 1 (0.2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| CHINA | 4 (0.5%) | 0 (0%) | 2 (0.4%) | 0 (0%) | 0 (0%) | 1 (1.3%) | 0 (0%) | 0 (0%) | 1 (3.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Warganegara | ||||||||||||
| YA | 716 (81%) | 22 (76%) | 357 (78%) | 97 (80%) | 58 (82%) | 64 (85%) | 38 (90%) | 38 (93%) | 25 (78%) | 13 (76%) | 2 (100%) | 2 (67%) |
| TIDAK | 172 (19%) | 7 (24%) | 98 (22%) | 24 (20%) | 13 (18%) | 11 (15%) | 4 (9.5%) | 3 (7.3%) | 7 (22%) | 4 (24%) | 0 (0%) | 1 (33%) |
| Pekerjaan | ||||||||||||
| FOREST RELATED | 459 (52%) | 15 (52%) | 236 (52%) | 67 (55%) | 40 (56%) | 32 (43%) | 29 (69%) | 20 (49%) | 8 (25%) | 9 (53%) | 1 (50%) | 2 (67%) |
| NON FOREST RELATED | 429 (48%) | 14 (48%) | 219 (48%) | 54 (45%) | 31 (44%) | 43 (57%) | 13 (31%) | 21 (51%) | 24 (75%) | 8 (47%) | 1 (50%) | 1 (33%) |
| Kawasan | ||||||||||||
| RURAL | 797 (90%) | 28 (97%) | 413 (91%) | 108 (89%) | 58 (82%) | 72 (96%) | 37 (88%) | 36 (88%) | 25 (78%) | 15 (88%) | 2 (100%) | 3 (100%) |
| URBAN | 91 (10%) | 1 (3.4%) | 42 (9.2%) | 13 (11%) | 13 (18%) | 3 (4.0%) | 5 (12%) | 5 (12%) | 7 (22%) | 2 (12%) | 0 (0%) | 0 (0%) |
| KepadatanParasit | 19,290,477 (234,170,084) | 120,091,096 (636,247,620) | 16,767,834 (225,522,199) | 1,019,013 (4,545,677) | 568,745 (3,527,452) | 35,240,340 (302,971,560) | 8,872,412 (37,670,508) | 66,391,762 (417,791,045) | 4,554,367 (15,629,632) | 5,945 (8,555) | 8,860,528 (12,519,535) | 22,102 (20,337) |
| Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| KlasifikasiKes | ||||||||||||
| INDIGENOUS | 888 (100%) | 29 (100%) | 455 (100%) | 121 (100%) | 71 (100%) | 75 (100%) | 42 (100%) | 41 (100%) | 32 (100%) | 17 (100%) | 2 (100%) | 3 (100%) |
| Year | 2,016.6 (3.5) | 2,017.4 (3.7) | 2,016.4 (3.5) | 2,015.8 (3.4) | 2,017.7 (2.8) | 2,016.7 (3.3) | 2,017.5 (3.5) | 2,015.4 (3.3) | 2,018.7 (3.2) | 2,019.0 (2.1) | 2,018.0 (1.4) | 2,020.7 (2.3) |
| duration_days | 7 (5) | 8 (3) | 6 (5) | 7 (7) | 7 (4) | 8 (6) | 9 (5) | 8 (4) | 9 (5) | 9 (5) | 6 (1) | 6 (1) |
| ¹ Mean (SD); n (%) | ||||||||||||
Comment:
The table summarises the distribution of age, gender, citizenship, forestry-related work, area type (rural or urban), parasite density, and duration from onset to diagnosis for 888 patients across 11 districts in Pahang. Most patients were male (84%), Malaysian citizens (81%), and from rural areas (90%); just over half (52%) worked in forest-related jobs. The mean age was 37 years (SD = 16), and the mean time from onset to diagnosis was 7 days (SD = 5). Lipis district recorded the most cases (n = 455).
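Parasite density is strongly right-skewed in this dataset (the summary output reports a mean of about 1.9e+07 against a median of 5,000), so a mean (SD) can be dominated by a few extreme values. A minimal sketch, using made-up counts rather than the real data, of how the two kinds of summary compare; the vector `density` is hypothetical.

```r
# Hypothetical parasite counts with one extreme value (not from the dataset)
density <- c(1200, 1500, 3040, 4640, 20000, 64000, 4.6e9)

# Mean and SD are pulled far upwards by the single extreme value
mean_sd <- c(mean = mean(density), sd = sd(density))

# Median and quartiles describe the bulk of the observations
median_iqr <- quantile(density, probs = c(0.25, 0.5, 0.75))

print(mean_sd)
print(median_iqr)
```

In gtsummary, a statistic string such as `"{median} ({p25}, {p75})"` could be supplied for skewed variables in the same `statistic` list; whether that suits this report is left to the author.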
6 DATA VISUALIZATION
6.1 LINE PLOT
The line graph visualizes the number of zoonotic malaria cases per year from 2011 to 2022, which helps identify the trend in cases over time.
geom_line shows the overall trend of malaria cases across time; the line is drawn in blue with a width of 1.
geom_point marks the exact value for each year, complementing the line; the points are drawn in red with size = 2.
# Define years and corresponding cases
Year <- 2011:2022
Cases <- c(36, 92, 118, 91, 34, 32, 74, 114, 69, 54, 100, 74)
# Create data frame
data_cases <- data.frame(Year, Cases)
# View the table
print(data_cases)
 Year Cases
1 2011 36
2 2012 92
3 2013 118
4 2014 91
5 2015 34
6 2016 32
7 2017 74
8 2018 114
9 2019 69
10 2020 54
11 2021 100
12 2022 74
ggplot(data_cases, aes(x = Year, y = Cases)) +
  geom_line(color = "blue", size = 1) +
  geom_point(color = "red", size = 2) +
  labs(title = "Malaria Cases by Year",
       x = "Year", y = "Number of Cases") +
  theme_minimal()
The line graph above shows the annual number of malaria cases from 2011 to 2022. Case numbers fluctuate considerably from year to year. Peaks occur in 2013 (118 cases) and 2018 (114 cases), which may reflect outbreak periods or lapses in control efforts. Cases fall sharply in 2015 (34 cases) and 2016 (32 cases), which may reflect improved control or a natural downturn in transmission. A further rise occurs in 2021 (100 cases), slightly below the earlier peaks, followed by a decline in 2022 (74 cases).
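The yearly counts for the line plot are typed in by hand, which can drift from the data if the file changes. A minimal sketch, assuming a toy stand-in for `data2` with only a `Year` column, of how the same two-column table could be derived from the records themselves; `toy` and `counts` are hypothetical names.

```r
# Toy stand-in for data2 (hypothetical rows; only the Year column matters here)
toy <- data.frame(Year = c(2011, 2011, 2012, 2013, 2013, 2013))

# Count rows per year and convert to a data frame with the same
# column names as data_cases (Year, Cases)
counts <- as.data.frame(table(Year = toy$Year),
                        responseName = "Cases",
                        stringsAsFactors = FALSE)

# table() stores the years as labels, so convert them back to integers
counts$Year <- as.integer(counts$Year)

print(counts)
```

With dplyr loaded, `data2 %>% count(Year, name = "Cases")` should give an equivalent table in one step.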
6.2 BAR PLOT
The bar plots visualize the distribution of zoonotic malaria cases by district across the 12-year period (2011 to 2022). Additional bar plots compare the number of cases by citizenship, gender, and work nature across years and districts. These visualizations help identify the groups with higher numbers of cases, so that control and prevention activities can focus on individuals with those backgrounds.
The ggplot2 package was used to construct the bar plots, with the ggplot() function specifying the dataset and aesthetic mappings. The aes() function mapped Year or district (Daerah) to the x-axis and a grouping variable (district, citizenship, gender, or work nature) to the fill aesthetic. The geom_bar() function drew the bars; with stat = "identity" and the default position, the groups are stacked within each bar. The geom_hline() function added a horizontal reference line that can serve as a visual threshold. Where used, the labs() function added a title and axis labels and theme_minimal() gave the plot a clean, simple appearance, while scale_fill_manual() set the colors of the fill categories manually so that the groups are easy to tell apart.
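If side-by-side bars or a combined gender and work-nature fill are wanted, a minimal sketch might look like the following. It uses toy counts (hypothetical values), and the object names `toy` and `p_dodge` are introduced only for illustration; the column names mirror those in `data2`.

```r
library(ggplot2)

# Toy counts (hypothetical values) by year, gender and work nature
toy <- data.frame(
  Year      = rep(c(2011, 2012), each = 4),
  Jantina   = rep(c("MALE", "MALE", "FEMALE", "FEMALE"), times = 2),
  Pekerjaan = rep(c("FOREST RELATED", "NON FOREST RELATED"), times = 4),
  Status    = c(20, 10, 4, 2, 40, 35, 9, 8)
)

p_dodge <- ggplot(toy, aes(x = factor(Year), y = Status,
                           fill = interaction(Jantina, Pekerjaan, sep = " / "))) +
  # position = "dodge" places the bars side by side instead of stacking them
  geom_col(position = "dodge") +
  labs(x = "Year", y = "Number of cases", fill = "Gender / work nature") +
  theme_minimal()

p_dodge
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")`; stacked bars make totals easier to read, while dodged bars make group-to-group comparison easier.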
cong_dat <- data2 %>%
  group_by(Year, Daerah) %>%
  summarise(Status = n())
cong_dat
6.2.1 Cases across the district from 2011 to 2022
cases_malaria <- ggplot(data2, aes(x = Daerah)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Cases by district, 2011 to 2022",
       x = "District",
       y = "Count") +
  theme_minimal()
cases_malaria
Lipis recorded the highest cumulative number of zoonotic malaria cases from 2011 to 2022, followed by Jerantut, Maran, Raub, and the other districts. The district with the fewest cases over the period was Cameron Highlands.
6.2.2 Cases across the district for each year
ggplot(cong_dat, aes(x = Year, y = Status, fill = Daerah)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 10) +
  scale_fill_manual(values = c("blue", "red", "orange", "yellow", "pink", "purple", "green", "brown", "lightgreen", "lightgrey", "chartreuse2"))
Comparing cases year by year gives a similar picture: Lipis recorded the highest number of cases every year, followed by Jerantut and the other districts. Jerantut contributed a large share of cases in the early part of the period, up to 2018. In the last three years, however, other districts such as Maran, Raub, and Kuantan recorded numbers comparable to Jerantut.
6.2.3 Total number of cases by citizenship from 2011 to 2022
cong_dat2 <- data2 %>%
  group_by(Year, Warganegara) %>%
  summarise(Status = n())
ggplot(cong_dat2, aes(x = Year, y = Status, fill = Warganegara)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("blue", "red"))
In every year, the majority of cases were among Malaysian citizens rather than non-citizens.
6.2.4 Comparison between genders across districts and years
cong_dat3 <- data2 %>%
  group_by(Year, Jantina) %>%
  summarise(Status = n())
ggplot(cong_dat3, aes(x = Year, y = Status, fill = Jantina)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("purple", "grey"))
cong_dat4 <- data2 %>%
  group_by(Daerah, Jantina) %>%
  summarise(Status = n())
ggplot(cong_dat4, aes(x = Daerah, y = Status, fill = Jantina)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("purple", "grey"))
The majority of cases were among males. Lipis recorded more female cases than any other district, while the remaining districts had broadly similar, low numbers of female cases. The number of female cases never reached 30 in any year.
6.2.5 Comparison of work nature of cases across districts
cong_dat5 <- data2 %>%
  group_by(Daerah, Pekerjaan) %>%
  summarise(Status = n())
ggplot(cong_dat5, aes(x = Daerah, y = Status, fill = Pekerjaan)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 30) +
  scale_fill_manual(values = c("darkgreen", "darkred"))
In most districts, the number of cases differed little between patients with forest-related and non-forest-related jobs.
6.3 BOXPLOT
box_plot1 <- ggplot(data2, aes(x = Daerah, y = Umur, fill = Daerah)) +
  geom_boxplot() +
  labs(title = "Box plot of age of zoonotic malaria cases by district, 2011 to 2022",
       x = "District",
       y = "Age") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")
box_plot1
The figure is a boxplot illustrating the distribution of age across the 11 districts. The median age of patients with zoonotic malaria in each district was between 20 and 40 years, indicating young adults. The age range was approximately similar between districts, except for Bera, Cameron Highlands, and Pekan.
6.4 SCATTER PLOT
scatter_plot1 <- ggplot(data2, aes(x = duration_days, y = KepadatanParasit, color = duration_days)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Duration from Onset to Diagnosis vs. Parasite Count",
    x = "Duration (days)",
    y = "Parasite Count"
  ) +
  scale_color_gradient(low = "blue", high = "red") +
  theme_minimal()
scatter_plot1
The scatterplot illustrates the relationship between the duration from symptom onset to diagnosis (in days) and the parasite count among patients. Notably, the wide range in parasite counts, spanning several orders of magnitude, has resulted in a highly skewed distribution. This skewness hampers the visualization of cases with relatively low parasite counts, which appear compressed near the lower portion of the y-axis.
Moreover, the data points are predominantly clustered along the lower axis, reflecting a concentration of cases with low parasite density across varying durations. This pattern, combined with the absence of a discernible upward or downward trend, suggests a weak or negligible correlation between time to diagnosis and parasite load. The observed imbalance in data distribution reinforces the likelihood that delay in diagnosis is not a strong predictor of parasite burden in this cohort.
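One common way to deal with the compression described above is a logarithmic y-axis, and a rank-based correlation gives a numeric check that is less sensitive to extreme values than a visual impression. A minimal sketch, using toy values rather than the real data; `toy`, `p_log`, and `rho` are hypothetical names, and the result says nothing about the actual dataset.

```r
library(ggplot2)

# Toy data (hypothetical values spanning several orders of magnitude)
toy <- data.frame(
  duration_days    = c(2, 4, 5, 6, 8, 10, 14),
  KepadatanParasit = c(1200, 64000, 1500, 4.6e9, 3040, 20000, 560)
)

p_log <- ggplot(toy, aes(x = duration_days, y = KepadatanParasit)) +
  geom_point(size = 3, alpha = 0.7) +
  # A log10 y-axis spreads out the low counts that a linear axis compresses
  scale_y_log10() +
  labs(x = "Duration (days)", y = "Parasite count (log10 scale)") +
  theme_minimal()

# Spearman's rho works on ranks, so extreme counts do not dominate it;
# use = "complete.obs" drops rows with missing values
rho <- cor(toy$duration_days, toy$KepadatanParasit,
           method = "spearman", use = "complete.obs")

p_log
print(rho)
```

Applied to `data2`, the same two steps would show whether the apparent lack of trend holds once low counts are visible.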
7 Recommendation
Control and prevention actions should be prioritized in areas with a high burden of cases, especially Lipis.
Awareness of zoonotic malaria infection should be targeted at young adult males and Malaysian citizens, irrespective of their work nature, as the data show that these groups account for most cases.
8 Animation
To make the visualizations more engaging, the graphs and plots can be transformed into animations.
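The code that produced the GIF files is not shown in this report. A minimal sketch of how an animated version of the yearly line plot might be built with gganimate, assuming the gifski package is installed for rendering; the object names `anim` and `render_gif` are hypothetical, and this is not necessarily how the files shown here were made.

```r
library(ggplot2)
library(gganimate)

# Yearly case counts as listed in the line plot section
data_cases <- data.frame(
  Year  = 2011:2022,
  Cases = c(36, 92, 118, 91, 34, 32, 74, 114, 69, 54, 100, 74)
)

anim <- ggplot(data_cases, aes(x = Year, y = Cases)) +
  geom_line(color = "blue") +
  geom_point(color = "red", size = 2) +
  labs(title = "Malaria Cases by Year", x = "Year", y = "Number of Cases") +
  theme_minimal() +
  # transition_reveal() draws the line progressively along the Year axis
  transition_reveal(Year)

# Rendering can be slow, so it is switched off by default in this sketch
render_gif <- FALSE
if (render_gif) {
  rendered <- animate(anim, renderer = gifski_renderer())
  anim_save("malaria_cases.gif", animation = rendered)
}
```

Setting `render_gif <- TRUE` would write the file that `knitr::include_graphics()` then embeds.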
knitr::include_graphics("malaria_cases.gif")
knitr::include_graphics("malaria_gender.gif")
9 References
- https://www.coursera.org/learn/jhu-advanced-data-visualization-r
- https://posit-connect.kk.usm.my/content/8f474ac1-9027-479e-bdf4-b6b8d6083bab/Data%20Visualization%20Assignment.html